A new method of construction waste classification based on two-level fusion

The automatic sorting of construction waste (CW) is an essential procedure in the field of CW recycling due to its remarkable efficiency and safety. The classification of CW is the primary task that guides automatic and precise sorting. In our work, a new method of CW classification based on two-level fusion is proposed to promote classification performance. First, statistical histograms are used to obtain global hue information and local oriented gradients, which are called the hue histogram (HH) and histogram of oriented gradients (HOG), respectively. To fuse these visual features, a bag-of-visual-words (BoVW) method is applied to code HOG descriptors in a CW image as a vector, and this process is named B-HOG. Then, based on feature-level fusion, we define a new feature to combine HH and B-HOG, which represent the global and local visual characteristics of an object in a CW image. Furthermore, two base classifiers are used to learn the information from the color feature space and the new feature space. Based on decision-level fusion, we propose a joint decision-making model to combine the decisions from the two base classifiers for the final classification result. Finally, to verify the performance of the proposed method, we collect five types of CW images as the experimental data set and use these images to conduct experiments on three different base classifiers. Moreover, we compare this method with other extant methods. The results demonstrate that our method is effective and feasible.


Introduction
As a global issue, construction waste (CW) obstructs the sustainable development of the construction industry. For instance, in the European Union, the construction sector generates over 500 million metric tons of CW per year, accounting for 50% of the waste produced in the EU [1]. China, as a rapidly developing country, also suffers from increasing CW: researchers have found that CW accounts for 30%-40% of the total urban waste [2]. As urbanization accelerates and the population increases, the amount of CW will continue to increase. Formerly, CW was collected and stacked in landfills, where it occupies land, pollutes groundwater, and contaminates the air. In Nigeria, Modu et al. [3] considered recycling an effective and sustainable strategy for solid waste management. Recycling waste materials not only provides economic benefits but also minimizes environmental issues. In China, according to the composition of CW, experts estimate that 95% of CW can be reused. However, due to the limited [...]
[...] capturing system and the CW classification framework based on two-level fusion are described in section 2. In section 3, we evaluate the performance of the proposed method and compare it with others. Finally, we summarize our work and give some remarks on future research directions in section 4. The automatic sorting of CW can be realized with our method.

Capturing system
For image acquisition, we use an industrial camera fixed on a metal frame located directly above the conveyor belt. The height of the metal frame is adjustable, so that the objects on the belt are fully captured while extraneous features, which would influence the experimental results, are kept out of the image. The data acquisition system is shown in Fig 1. The speed of the conveyor belt can be varied according to the needs of users; a governor controls the speed of the motor. In this experiment, the conveyor belt runs at 100 mm per second and the industrial camera frame rate is 10 fps, so the belt advances 10 mm between consecutive frames. The screen is used to display the CW images and experimental results.

Experimental materials
In this study, we mainly classify five materials, namely concretes, bricks, plastics, foams, and woods, provided by a construction management department. We capture 125 pictures of each class by using the capturing system, and 100 samples of each type are used as training data. These samples were picked from the waste collection site and were not cleaned, so each sample retains its surface contamination. Each image is labeled with a five-element array of 0 and 1 values: if the image belongs to class M, the corresponding element is 1, and the others are 0. The sample images and corresponding labels of the five types of CW materials are shown in Fig 2. The appearance of the concrete is rough and white. Plastic, as a common CW, has a smooth surface. Foam can appear in different colors because other materials often adhere to its surface. Wood has a rectangular appearance. The color of bricks depends on their constituents; they are often orange or red. These materials are used in different construction stages.
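The labeling scheme described above is a one-hot encoding. The following minimal sketch illustrates it; the class order and the helper name `one_hot` are assumptions made for illustration, since the text does not state which array position corresponds to which material.

```python
# Assumed class order (hypothetical); the paper does not specify the index mapping.
CLASSES = ["concrete", "brick", "plastic", "foam", "wood"]


def one_hot(label, classes=CLASSES):
    """Return a list with 1 at the position of `label` and 0 elsewhere."""
    if label not in classes:
        raise ValueError(f"unknown class: {label}")
    return [1 if name == label else 0 for name in classes]


print(one_hot("plastic"))  # [0, 0, 1, 0, 0]
```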

Extracting visual features
(1) The global hue features. Color is a comparatively robust visual feature that is invariant to object scale and position [22]. Similarly, color information is a main feature that allows humans to recognize objects. Since the hue-saturation-value (HSV) model is consistent with the characteristics of the human visual system, the HSV feature is one of the main features for pattern recognition [23,24]. The HSV color space can be described as a conical geometry with three parameters, i.e., hue, saturation, and value [25]. In this method, we use the information from the hue channel to classify CW. We convert the image captured by the industrial camera from RGB space to HSV space. Next, we extract the hue information from the global perspective of the image. Finally, HH is used to describe the color information of the image. The algorithm flow for extracting HH is shown in Algorithm 1.
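As a rough illustration of the hue-histogram idea (not a reproduction of Algorithm 1), the sketch below converts RGB pixels to HSV with the Python standard library and accumulates a normalized histogram over the hue channel. The function name `hue_histogram`, the flat pixel-list input, and the number of bins are assumptions; the text does not state the bin count used for HH.

```python
import colorsys


def hue_histogram(pixels, bins=16):
    """Normalized hue histogram for an iterable of (R, G, B) pixels in 0..255.

    `bins` is an assumed parameter; the paper does not give the bin count.
    """
    counts = [0] * bins
    total = 0
    for r, g, b in pixels:
        # colorsys returns hue in the range [0, 1).
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        index = min(int(h * bins), bins - 1)
        counts[index] += 1
        total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


# Toy "image": two reddish pixels, one green, one blue.
pixels = [(255, 0, 0), (250, 10, 5), (0, 255, 0), (0, 0, 255)]
hh = hue_histogram(pixels, bins=6)
print(hh)
```

In practice the pixel list would come from the full camera frame, so the histogram summarizes the global hue distribution regardless of where the object lies in the image.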
(2) The locally oriented gradient features. The histogram of oriented gradients (HOG) is a good feature for describing the shape and texture information of an object [26]. The process of computing HOG starts by dividing an image into cells and grouping the cells into blocks [23]. In each block, we calculate the gradient magnitude and gradient orientation, which are

$$M(x, y) = \sqrt{m_x^2 + m_y^2} \qquad (1)$$

$$\theta(x, y) = \arctan\left(\frac{m_y}{m_x}\right) \qquad (2)$$

where $m_y$ and $m_x$ are the vertical and horizontal gradients computed by the 1D filter, and $M$ and $\theta$ represent the gradient magnitude and gradient orientation of pixel $(x, y)$, respectively. The detailed processing is referenced in [27], and we name it EXTRACTHOG(·). The traditional HOG model of an image is constructed from the HOG descriptors of its blocks. It usually does not provide good classification performance due to redundant information. To reduce the dimensions while maintaining the discrimination of HOG, a BoVW method is used to code the HOG descriptors of all blocks in an image. We name the new feature $F_{\text{B-HOG}}$. Assume there are $N$ classes of CW images and each class has $M$ images for training, denoted as $\{I_j^{(i)};\ i = 1, \ldots, N;\ j = 1, \ldots, M\}$. The procedure of extracting $F_{\text{B-HOG}}$ is shown in Algorithms 2 and 3.

Step 1: Divide $I_t$ into $L$ blocks.
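The BoVW coding step can be sketched as follows, assuming a codebook of visual words has already been learned (for example with K-means over the training descriptors). This is only an illustration of the general technique, not the content of Algorithms 2 and 3; the function names and the toy 2-D descriptors are hypothetical, and real HOG block descriptors have many more dimensions.

```python
import math


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(codebook):
        dist = math.dist(descriptor, word)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def encode_bovw(descriptors, codebook):
    """Code the block descriptors of one image as a normalized word histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


# Toy example: 3 visual words and 4 block descriptors (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.0, 5.0]]
print(encode_bovw(descriptors, codebook))  # [0.25, 0.5, 0.25]
```

The output length equals the vocabulary size, so every image is represented by a vector of the same dimension regardless of how many blocks it contains.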

Feature-level fusion
As mentioned before, the color feature and HOG descriptors are extracted from multiple views. They can provide comprehensive information for CW classification. To combine the information from multiple views, we construct a new feature based on feature-level fusion and name it the Color-HOG feature. The overall flow is illustrated in Fig 3. For one sample image I, we convert it to HSV color space and extract the hue information from the HSV model. Then, the HOG descriptors are extracted from the local perspective of the image and coded by

PLOS ONE
the BoVW method. Finally, the new Color-HOG feature $F_{CH}$ can be obtained through Eq (3), where $F_{HH}$ and $F_{\text{B-HOG}}$ indicate the color feature and the coded HOG descriptors in image I, respectively. Two weights balance $F_{HH}$ and $F_{\text{B-HOG}}$ of image I; in this experiment, both weights are set to 1.
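Eq (3) itself is not reproduced in this text. One common way to realize a weighted feature-level fusion of two vectors of different lengths is weighted concatenation, sketched below. Treat the concatenation form, the function name `fuse_features`, and the parameter names as assumptions rather than the authors' exact definition.

```python
def fuse_features(f_hh, f_bhog, w_hh=1.0, w_bhog=1.0):
    """Weighted concatenation of the hue histogram and the B-HOG vector.

    Assumed form of the fusion; both weights default to 1, the value
    reported for the experiments.
    """
    return [w_hh * x for x in f_hh] + [w_bhog * x for x in f_bhog]


f_hh = [0.5, 0.5]          # toy hue histogram
f_bhog = [0.25, 0.75]      # toy B-HOG vector
print(fuse_features(f_hh, f_bhog))  # [0.5, 0.5, 0.25, 0.75]
```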

Decision-level fusion
It is observed that bricks and woods have more color saliency than the other materials. This property can help to distinguish salient materials from the others. Therefore, two base classifiers are used to learn the information from the HH feature space and the Color-HOG feature space, and we design a joint decision model to combine the decisions from the two feature spaces. The flowchart of CW classification is shown in [...]. Inspired by the fusion mechanism, the new joint decision-making model is illustrated in Eq (4). Let $P_H$ and $P_{CH}$ be the outputs of the base classifiers, which are learned from the HH feature space and Color-HOG feature space, respectively. It is noteworthy that $P_H$ is a vector containing two probability values, which are used to determine whether the object belongs to the categories with salient color. $P_{CH}$ is a vector containing five elements, and each element represents the probability that the sample belongs to the corresponding category of CW. $R$ is a transition matrix. $P_{HCH}$ is the result of the decision-making model.
In Eq (4), $\omega_1$ and $\omega_2$ are the fusion weights of $P_H$ and $P_{CH}$. When $\omega_1 = 0$, the evidence from the Color-HOG feature space is treated as completely credible. In this case, we classify CW based on feature-level fusion alone and name the method feature-level fusion-based CW classification (FLF). When $\omega_1 = 1$, the decision model uses only the HH feature to classify CW. In this case, since the HH feature can only effectively distinguish salient objects (i.e., bricks and woods) from the other materials, we cannot obtain an explicit label. For $\omega_1 \in (0, 1)$, the CW classification method is based on both feature-level fusion and decision-level fusion. We denote it as the two-level fusion-based CW classification method (TLF). It can be seen that FLF is a special case of TLF. The influence of tuning $\omega_1$ on the TLF algorithm is investigated in the parameter-tuning experiments.
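Eq (4) is not reproduced in this text, so the sketch below shows only one plausible reading of the joint decision model: the two-element vector $P_H$ is mapped to five classes by a transition matrix $R$ and then combined linearly with $P_{CH}$. The linear form, the entries of `R`, the class order, and all function names are assumptions made for illustration.

```python
# Assumed class order and salient set (bricks and woods, as stated in the text).
CLASSES = ["concrete", "brick", "plastic", "foam", "wood"]
SALIENT = {"brick", "wood"}

# Hypothetical 5x2 transition matrix R: column 0 spreads the "salient"
# probability over brick and wood, column 1 spreads the "non-salient"
# probability over the remaining three classes.
R = [[0.5, 0.0] if c in SALIENT else [0.0, 1.0 / 3.0] for c in CLASSES]


def fuse_decisions(p_h, p_ch, w1=0.5):
    """Assumed fusion: w1 * (R @ p_h) + w2 * p_ch with w2 = 1 - w1."""
    w2 = 1.0 - w1
    expanded = [row[0] * p_h[0] + row[1] * p_h[1] for row in R]
    return [w1 * e + w2 * p for e, p in zip(expanded, p_ch)]


def argmax_label(probs):
    """Class name with the highest fused score."""
    return CLASSES[max(range(len(probs)), key=probs.__getitem__)]


p_h = [0.9, 0.1]                         # [salient, non-salient], toy values
p_ch = [0.40, 0.35, 0.05, 0.05, 0.15]    # toy Color-HOG classifier output
fused = fuse_decisions(p_h, p_ch, w1=0.5)
print(argmax_label(p_ch), "->", argmax_label(fused))  # concrete -> brick
```

With `w1=0.0` the function returns `p_ch` unchanged, which mirrors the statement that FLF is the special case of TLF with $\omega_1 = 0$.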

Salient features
The envelopes of the HH features, B-HOG features, and Color-HOG features of the 5 CW categories are displayed in Fig 5. The HH features of the CW images are shown in the first row. We observe that the probability curve of each category has two peaks, but the information distributions are different. The first peaks of the brick and wood materials are narrow, whereas the peaks of the plastic, concrete, and foam materials are wide. Therefore, HH features can be used to distinguish salient objects (i.e., bricks and woods) from the others and can serve as a basis for CW classification. The B-HOG features of the 5 CW categories are shown in the second row of Fig 5.

Effects of parameter tuning
The number of visual words is a significant parameter that affects the performance of the TLF method. We use the K-means clustering method to construct BoVW vocabularies of different sizes and determine the optimal size. The accuracies of the TLF method with different vocabulary sizes are shown in Table 1. With a small vocabulary, distinct pieces of significant information may be merged into one cluster. As the size of the visual vocabulary increases, more CW details can be captured, but a large vocabulary tends to overfit. The highest average accuracy of 96.32% is obtained with a vocabulary size of 250. Therefore, the number of visual words in the proposed method is set to 250.
In the joint decision model, $\omega_1$ and $\omega_2$ represent the fusion weights of $P_H$ and $P_{CH}$, which satisfy $\omega_1 + \omega_2 = 1$. To determine the optimal fusion weights, we conduct extensive experiments with different values of $\omega_1$. For $\omega_1 = 1$, only color features are used to classify CW, as described in section 2.4; in this instance, CW can only be divided into salient objects and other materials. For $\omega_1 \in [0, 1]$, the average accuracy curves of the 5 CW categories are shown in Fig 6. When $\omega_1 < 0.5$, the evidence from the Color-HOG feature space plays the leading role, and the accuracy curves indicate that the classification accuracy increases as $\omega_1$ increases. For $\omega_1 > 0.5$, the evidence from the color feature space plays the dominant role, and the curves show a downward trend. Because the accuracy peaks when the two weights are equal, this suggests that the features from the Color-HOG feature space and the HH feature space are equally important. In our work, we recommend taking $\omega_1 = 0.5$ as the default value.

Evaluation of classification performance
In this section, the CW images described in section 2.2 are used to assess the performance of the proposed method. Five-fold cross validation is applied on all image sets. Confusion matrices are used to represent the performance of the proposed algorithm. The results of the proposed method on three base classifiers are shown in Tables 2-4. The base classifiers include the SVM [28], K-nearest neighbor (KNN) [29], and random forest (RF) [30] methods. The recall of plastics in the FLF method and TLF method based on the SVM classifier is 99.2%, which is higher than that of bricks, concretes, foams, and woods. Table 3 shows that the recall of plastics is the highest, at 98.4%. Table 4 indicates that in the TLF method, the recall of bricks is higher than that of plastics. Generally speaking, the recall values of most materials are higher than 90%. With the RF classifier, however, the recall of concretes and foams is 89.6% and 87.2%, respectively. If we replace the base classifier, the recall of concretes and foams can be improved, so the proposed methods still perform well. In addition, precision is also used to evaluate the FLF and TLF methods. Table 2 shows that the precision of plastics in the FLF method and TLF method is 98.4% and 99.2%, respectively, which is higher than that of the other materials. Based on the KNN classifier, the precision of plastics is the highest, at 98.4%. Although Table 4 indicates that the results of concretes and foams are slightly confused, the average accuracies of the FLF method and TLF method are 92.8% and 93%, respectively. This confusion may be related to the performance of the base classifiers. Overall, we can see that our proposed CW classification method achieves good performance. Some of the results are visualized in Figs 7 and 8. The correct classification results are marked in green, and the incorrect classification results are marked in red.
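For readers who want to recompute recall, precision, and accuracy from a confusion matrix, a minimal sketch follows. The row/column convention (rows are true classes, columns are predicted classes) is an assumption, and the 3-class matrix uses made-up counts for illustration only; it is not taken from Tables 2-4.

```python
def per_class_metrics(confusion):
    """Recall and precision per class, plus overall accuracy.

    Assumed convention: confusion[i][j] is the number of samples whose
    true class is i and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    recall, precision = [], []
    for i in range(n):
        row_sum = sum(confusion[i])                       # all samples of true class i
        col_sum = sum(confusion[r][i] for r in range(n))  # all samples predicted as i
        recall.append(confusion[i][i] / row_sum if row_sum else 0.0)
        precision.append(confusion[i][i] / col_sum if col_sum else 0.0)
    accuracy = correct / total if total else 0.0
    return recall, precision, accuracy


# Illustrative counts only (not the paper's results).
cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
recall, precision, accuracy = per_class_metrics(cm)
print(recall)     # [0.8, 0.9, 1.0]
print(accuracy)   # 0.9
```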

Testing in various conditions
In industry, the environment of the construction jobsite is variable. The shaking of the capturing system produces noise. In addition, because CW contains demolished materials, the sizes of the objects may vary. To verify the robustness of our proposed approaches, we test the performance of the TLF and FLF methods at different noise levels and image scales. The results are reported in Fig 9. As the scale changes, the accuracy curves show a slight downward trend; however, when the CW image is rescaled by a factor between 0.5 and 1.5, the overall accuracy remains higher than 80%. As the noise increases, the accuracy curves also show a downward trend. When the noise intensity is lower than 0.3%, the classification accuracy of the proposed methods is higher than 80%. At a real construction jobsite, however, the types of noise vary. Overall, the proposed methods remain reasonably robust to moderate changes in scale and noise.

Comparison of classification performance
To verify the effectiveness of the new Color-HOG feature in our proposed approach, we conduct comparison experiments with other traditional features, i.e., HOG [31], LBP [32], SURF [33], and SIFT [34]. In the comparison experiments, the traditional features are also coded by visual words. As shown in Table 5, based on the SVM classifier, the average accuracies of LBP-BOW [14], SURF-BOW [15], SIFT-BOW [13], HOG-BOW [12], and Gabor Wavelets [19] are 71.36%, 83.36%, 83.84%, 59.04%, and 94.72%, respectively. Moreover, Tables 5-7 show that the FLF method and TLF method improve accuracy by more than 10% compared with the other methods except Gabor Wavelets. Our proposed methods do not achieve the best result on every test; for example, the Gabor Wavelets method has the best result with the SVM classifier on the Test2 dataset. However, if we replace the base classifier, our proposed methods can achieve the desired results, so this phenomenon may relate to the potential of the base classifiers. Generally speaking, the average accuracy of our proposed method is better than that of the Gabor Wavelets method. In other words, our proposed methods remain competitive.
In addition to the above-mentioned classic models based on handcrafted features, deep learning models have also become a trend. VGG-16 [9], ResNet-50 [6], and the Vision Transformer network (ViT) [8] are three classic deep learning models used for comparison with the TLF method. Table 8 shows that the precision of VGG-16, ResNet-50, and ViT is 94.0%, 95.5%, and 96.2%, respectively. Among these deep learning models, ViT has the highest precision, but it is still lower than that of the TLF method. The precision of the TLF method is 2.32 percentage points higher than that of the VGG-16 network and 0.82 percentage points higher than that of ResNet-50. Table 9 shows that the recall of VGG-16, ResNet-50, and ViT is 93.6%, 95.2%, and 96.0%, respectively. Among the deep learning models, ViT has the highest recall; however, the recall of the TLF method is 0.32 percentage points higher than that of ViT. Table 10 shows that the classification accuracy of VGG-16, ResNet-50, and ViT is 93.6%, 95.2%, and 96.0%, respectively. Compared with these deep learning models, the TLF method has higher accuracy. Although these deep learning models have achieved good performance on many datasets, the TLF method achieves better performance on the CW dataset.

Conclusions
With the increasing focus on preserving the environment, CW recycling has become an important topic. Sorting a large amount of CW precisely and quickly is an urgent problem. This research shows that it is feasible to classify CW by computer vision. Motivated by the characteristics of the human visual system, the TLF method is proposed to classify CW materials in this work. The TLF method is based on a joint model of feature-level fusion and decision-level fusion. For the former, a statistical histogram and a BoVW method are applied to capture color features and HOG descriptors from a CW image, respectively, and a new feature named Color-HOG is constructed by feature-level fusion. For the latter, we fuse decisions from two base classifiers, which are learned from HH features and Color-HOG features. We name the model based on feature-level fusion alone FLF, which is a special case of the TLF method. The classification accuracy of the FLF method based on the three base classifiers is 95.2%, 94.4%, and 92.96%, which is higher than that of the other state-of-the-art methods. Experiments demonstrate that Color-HOG is a robust feature for representing the discriminative characteristics of CW. Compared with the FLF method, the TLF method has higher accuracy: the accuracy of the TLF method based on the SVM classifier is 1.12 percentage points higher than that of the FLF method. In addition, we conduct experiments under various conditions, and the results show that the proposed method also performs well under different scales and noise levels. In other words, the TLF method is an effective tool to promote the sorting and recycling of CW. This will be beneficial to reducing construction and CW management costs.